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Abstract 

We use persistent homology to build a quantitative understanding of large complex systems that are driven far-from- 
equilibrium; in particular, we analyze image time series of flow field patterns from numerical simulations of two important 
problems in fluid dynamics: Kolmogorov flow and Rayleigh-Benard convection. For each image we compute a persistence 
diagram to yield a reduced description of the flow field; by applying different metrics to the space of persistence diagrams, 
we relate characteristic features in persistence diagrams to the geometry of the corresponding flow patterns. We also 
examine the dynamics of the flow patterns by a second application of persistent homology to the time series of persistence 
diagrams. We demonstrate that persistent homology provides an effective method both for quotienting out symmetries in 
families of solutions and for identifying multiscale recurrent dynamics. Our approach is quite general and it is anticipated 
to be applicable to a broad range of open problems exhibiting complex spatio-temporal behavior. 


1. Introduction 

We introduce new mathematical techniques for ana¬ 
lyzing complex spatiotemporal nonlinear dynamics and 
demonstrate their efficacy in problems from two differ¬ 
ent paradigms in hydrodynamics. Our approach em¬ 
ploys methods from algebraic topology; earlier efforts 
have shown that computing the homology of topological 
spaces associated to scalar or vector fields generated by 
complex systems can provide new insights into dynamics 
pa mail 13 0. We extend prior work by using a relatively 
new tool called persistent homology min]. 

Complex spatiotemporal systems often exhibit compli¬ 
cated pattern evolution. The patterns are given by scalar 
or vector fields representing the state of the system un¬ 
der study. Persistent homology can be viewed as a map 
PD that assigns to every field a collection of points in 
called a persistence diagram. For a given scalar field 
f : D —i R, the points in the persistence diagram PD(/) 
encode geometric features of the sub-level sets C{f,6) = 
{x G D \ f{x) < 9} for all values of 9. A feature encoded 
by the point {9i,,9d) G PD(/) appears in C{f,9b) for the 
first time and disappears in C(/, 9d)). Therefore, 9b and 9d 
are called birth and death coordinates of this feature. The 
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lifespan 9d—9b > 0 indicates the prominence of the feature. 
In particular, features with long lifespans are considered 
important and features with short lifespans are often as¬ 
sociated with noise. Thus, the persistence diagram is a 
highly simplified representation of the field generating the 
pattern. 

The space of all persistence diagrams, Per, can be en¬ 
dowed with a variety of metrics under which PD is a con¬ 
tinuous function. This has several important implications 
that we exploit in this paper. First, continuity implies that 
small changes in the field pattern, e.g. bounded errors as¬ 
sociated with measurements or numerical approximations, 
lead to small changes in the persistence diagrams. Second, 
by using different metrics, we can vary our focus of interest 
between larger and smaller changes in the persistence dia¬ 
grams. Moreover, by comparing different metrics, we can 
infer if the changes in a pattern affect geometric features 
with longer or shorter life spans. Finally, since, applying 
the map PD to a time series of patterns produces a time 
series in Per, the distance between the consecutive data 
points in Per can be used to quantify the average rate at 
which the geometry of the patterns is changing. 

As mentioned above, the dynamics of spatiotemporal 
systems are characterized by the time-evolution of the pat¬ 
terns corresponding to the fields generated by the system. 
However, capturing these vector fields, either experimen¬ 
tally or numerically, results in multi-scale high dimensional 
data sets. In order to efficiently analyze these data sets, 
a dimension reduction must be performed. We use persis¬ 
tent homology to perform nonlinear dimension reduction 
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from a time series of patterns to a time series of persis¬ 
tence diagrams. We show that this reduction can cope 
with redundancies introduced by symmetries (both dis¬ 
crete and continuous) present in the system. In particu¬ 
lar, this approach directly quotients out symmetries and, 
thereby, permits easy identification of solutions that lie on 
a group orbit. 

Separately, we also apply persistent homology to extract 
information about dynamical structures in the reduced 
data. Characterizing dynamics in the space of persistence 
diagrams cannot be done using conventional methods (e.g., 
time delay embeddings), since choosing a coordinate sys¬ 
tem in Per is currently an open problem. However, since 
Per is a metric space, the geometry of the point cloud X, 
generated by the time series of the reduced data, is en¬ 
coded by a scalar field which assigns to each point in Per 
its distance to X. We show how persistent homology may 
be applied to describe dynamics by characterizing the ge¬ 
ometry of X. 

An outline of the paper is as follows. In Section we 
present a brief overview of the two fluid flows examined in 
this paper: (1) Kolmogorov flow and (2) Rayleigh-Benard 
convection. We note here, for emphasis, that while per¬ 
sistent homology can be applied to vector fields, it will be 
sufficient for this paper to focus on scalar fields drawn from 
these systems (specifically, one component of the vorticity 
field for Kolmogorov flow, and the temperature field for 
Rayleigh-Benard convection). 

In Section we discuss key issues related to the appli¬ 
cation of persistent homology. By now, the mathematical 
theory of persistent homology is well developed. There¬ 
fore, our main emphasis is on the computational aspect of 
passing from the data to the persistence diagrams. Sec¬ 
tion [^describes the correspondence between the geometric 
features of a scalar field and the points in its corresponding 
persistence diagram. Section discusses the structure of 
the space Per and the properties of the associated metrics. 

In Sections i and [3 we discuss how these metrics can 
be used to analyze dynamics. First, we interpret distance 
between the persistence diagrams representing the consec¬ 
utive data points in the time series as a rate at which geom¬ 
etry of the corresponding scalar fields is changing. Second, 
we motivate and explain the procedure for extracting the 
geometric structure of the point cloud in Per. 

We close the paper by applying the developed tech¬ 
niques to the following problems. In Section]^ we identify 
distinct classes of symmetry-related equilibria for Kolo- 
mogorov flow. In Section we show that a relative peri¬ 
odic orbit for Kolmogorov flow collapses to a closed loop 
in Per. Finally, in Section we deal with identifying re¬ 
current dynamics that occur on different time scales in our 
study of Rayleigh-Benard convection flow. 


2. The Systems to be Studied 

2.1. Kolmogorov Flow 

For the study of turbulence in two dimensions, Kol¬ 
mogorov proposed a model flow where the two-dimensional 
(2D) velocity field u(a:, y, t) is given by 
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-I- /3u • Vu = —Vp -I-— ou -I- f (1) 

ot p 

V • u = 0 

(with j5 = 1 and a = 0), where p(x,y) is the pressure 
field, V is the kinematic viscosity, p is fluid density, and 
f = X sin{Ky)x is the forcing that drives the flow [10]. Lab¬ 
oratory experiments in electromagnetically-driven shallow 
layers of electrolyte can exhibit flow dynamics that are 
well-described by Equations Q with appropriate choices 
of P and a to capture three-dimensional effects, which are 
commonly present in experiments m- In this paper, we 
refer to all models described by Equations Q (including 
experimentally-realistic versions) as Kolmogorov flows. 

It is convenient to use the vorticity-stream function for¬ 
mulation |12j to study Kolmogorov flow analytically and 
numerically. Equations Q, written in terms of the z- 
component of the vorticity field a; = (V x u) • k, a scalar 
field, take the form 

doj 

— + /3u • Vw = — auj + x^cos^Ky). (2) 

For the current study, we choose /3 = 0.83, = 3.26 x 

10“® m^/s, a = 0.063 s“^, p = 959 kg/m^, and A = 
27r/K = 0.0254 m. We express the strength of the forcing 
in terms of a non-dimensional parameter, the Reynolds 

number Re = 

Equation ([^ is solved numerically by using a pseudo- 
spectral method m. assuming periodic boundary condi¬ 
tions in both X and y directions, i.e., uj{x,y) = uj{x + 
Lx,y) = uj{x,y + Ly), where = 0.085 m and Ly = = 

0.1016 m are the dimensions of the domain in the x and y 
directions, respectively. 

It is important to note that Equation ([^ , with periodic 
boundary conditions, is invariant under any combination 
of three distinct coordinate transformations: (I) a trans¬ 
lation along x: Tsxix,y) = (cc -I- Sx,y), Sx G [0, L^,]; (2) 
a rotation by tt: TZ(x,y) = {—x,—y); and (3) a reflection 
and a shift: 'D{x,y) = {—x,y -I- A/2). Because of these 
symmetries, each particular solution to Equation (§ gen¬ 
erates a set of solutions which are dynamically equivalent. 
Physically, invariance under continuous translation leads 
to the existence of relative equilibria (REQ) and relative 
periodic orbit (RPO) solutions, in addition to equilibria 
(EQ) and periodic orbit (PO) solutions. 

For Re = 25.43, the flow is characterized by a steady 
RPO; Figure [^a) shows a projection, plotted using the 
three dominant Fourier modes of this RPO. The RPO has 
a period 34.78 seconds and a drift speed 1.354 x 10“® m/s. 
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Figure 1: (a) Three-dimensional projections of a stable RPO at Re = 
25.43 from the Kolmogorov flow using the imaginary part of the three 
dominant Fourier modes, /i, 72, and I 3 . The gray line indicates the 
evolution of a RPO; three snapshots sampled from that orbit are 
indicated by a red diamond, a red circle, and a red square, which are 
analyzed below, (b) Three-dimensional projections of a turbulent 
trajectory, at Re = 26.43, using the real parts of the three dominant 
Fourier modes, 7?i, R 2 , and R^. The gray line indicates the chaotic 
evolution of the flow, which is influenced by the presence of unstable 
flxed points, indicated by red circles, which are also analyzed below. 


The tunnel-like structure is a result of the periodic mo- 
tion superposed over the slow drift along the x-direction. 
For larger forcing {Re = 26.43), the flow becomes weakly 
turbulent, as can be seen from the Fourier projections 
in Figure [^b). The turbulent dynamics in this regime 
are of great interest as the flow explores a region of the 
state space which contains “weakly” unstable EQ, PO, 
REQ, and RPO solutions. Recent theoretical advances 
have shown that the identification of these solutions could 
aid the understanding of weakly turbulent dynamics El- 
For instance, if the turbulent trajectory is close to an EQ 
solution (uJo), which is characterized by dujo/dt = 0, we 
would expect the instantaneous rate of change of lu to be 
relatively small, i.e., dojjdt « 0. Similarly, a close pass to 
a PO solution would mean uj{t + T) « uj{t), where T is 
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Figure 2: (a) A snapshot of the ^-component of the vorticity fleld 
(jj for Kolmogorov flow from the stable relative periodic orbit found 
at Re = 25.43. (b) A snapshot of the renormalized 8-bit mid-plane 
temperature fleld T* for Rayleigh-Benard convection from the stable 
almost-periodic orbit found at Ra = 3000 and Pr = 1. 



the period of the PO that is guiding the dynamics of the 
turbulent trajectory. The turbulent trajectory depicted in 
Figure[^b) passes close to unstable EQ and REQ solutions 
which are indicated by the red dots. 

2.2. Rayleigh-Benard Convection 

Rayleigh-Benard convection is a canonical pattern form¬ 
ing system that has been used to gain many new fun¬ 
damental insights into the spatiotemporal dynamics of 
systems that are driven far-from-equilibrium [m IS]. 
Rayleigh-Benard convection is the buoyancy driven fluid 
flow that occurs when a shallow layer of fluid is heated uni¬ 
formly from below in a gravitational field. The dynamics 
are governed by the Boussinesq equations. 


9u \ 

= 

— Vp -1- V^u -f RaTz, 

( 3 ) 

BT 


V^T, 

( 4 ) 

V • u = 

0 , 

( 5 ) 


where u(a:, y, z, t) is a vector field of the fluid velocity, 
p{x, y, z, t) is the pressure field, and T{x, y, z, t) is the tem¬ 
perature field. In our notation, the origin of the Cartesian 
coordinates (x, y, z) at the center of the domain are at the 
lower heated plate where z is a unit vector opposing grav¬ 
ity. Equations (i-® represent the conservation of mo¬ 
mentum, energy, and mass, respectively. The equations 
have been nondimensionalized using the vertical diffusion 
time of heat as the time scale, the layer depth as the length 
scale, and the constant temperature difference between the 
lower and upper plates as the temperature scale. 

In our work, we consider Rayleigh-Benard convection 
in a shallow domain with a cylindrical cross-section. The 
no-slip fluid boundary condition u = 0 is applied to all 
material surfaces. The lower and upper plates are held at 
a constant temperature where T(z = 0) = 1 and T(z = 
1) = 0, respectively. The lateral sidewalls of the cylindrical 
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container are assumed to be perfectly conducting, which 
yields T{z) = 1 — z. 

The dynamics can be described using three non- 
dimensional parameters. The Rayleigh number Ra repre¬ 
sents the ratio of buoyancy to viscous forces. At the crit¬ 
ical value Rgc = 1708, an infinite layer of fluid undergoes 
a bifurcation to straight and parallel convection rolls. For 
increasing values of the Rayleigh number Ra > i?Oc, the 
dynamics become periodic, chaotic, and eventually turbu¬ 
lent. The Prandtl number Pr is the ratio of the momentum 
and thermal diffusivities. For typical experiments using 
compressed gasses, Pr « 1. Lastly, the aspect ratio of the 
cylindrical domain F is the ratio of the domain’s radius to 
its depth. 

We numerically integrate Eqs. (§-(© using a highly par¬ 
allel spectral element algorithm that has been tailored for 
the study of convection (c.f. m)- Figure [gb) shows a 
typical pattern from a numerical simulation of Rayleigh- 
Benard convection. In this simulation, Ra = 3000, Pr = 
1, and the aspect ratio of the domain is F = 10. The nu¬ 
merical simulation is initiated from a field of small random 
perturbations to the temperature field and is integrated for 
long times. Figure [^b) illustrates the fluid temperature 
field at the horizontal mid-plane (z = 1/2), where light 
is warm rising fluid and dark is cool falling fluid. This 
image is a snap shot in time of a time-dependent pattern 
where the dynamics are nearly periodic in time. The pat¬ 
tern shown does not include the region near the sidewall. 
Specifically, a distance of one-layer depth from the lateral 
sidewall is not shown (this distance is approximately the 
width of a convection roll). This is done to remove the 
complex fluid flow that occurs in the small region near the 
sidewalls to allow our diagnostics to focus upon the bulk 
patterns and dynamics (c.f. [TO]'). 

3. Persistent Homology 

The aim of this paper is to introduce an approach for 
analyzing the dynamics of the pattern evolution in spa- 
tiotemporal systems. This is done in two steps. First, 
we perform nonlinear data reduction, and then we extract 
information about the dynamical structures from this re¬ 
duced data. We formulate both of these tasks in terms 
of analyzing the structure of the sub-level sets of a scalar 
function / : D —>■ K, where Z? is a topological space. Tools 
from algebraic topology, homology in particular, are used 
to capture and quantify the geometry of the sub-level sets. 

Recall that given any topological space .Z, homology as¬ 
signs to Z a sequence of vector spaces P[k{Z), /c = 0,1,.... 
The dimension of Plk{Z) is called the fc-th Betti number 
and is denoted by /^^(Z). Betti numbers provide geometric 
information about A: f3o{Z) is the number of connected 
components, or pieces, of Z; Pi{Z) indicates the number of 
loops or tunnels in Z; and P 2 {Z) is the number of cavities. 

Our goal is to understand structure of the sub-level sets 

( 6 ) 


for all values of 6* G M. As we vary 6, the number of com¬ 
ponents, loops, and cavities in C(/, 9) changes, implying 
that /3fc(C(/,d)), k = 0,1,2, also changes. (See Section]^ 
for examples.) What is remarkable is that, under very 
weak conditions, we can choose bases for the vector spaces 
Z7fe(C(/, 0)) over all values of 6 such that, given a basis 
element of we can identify a unique value 0t, 

at which this basis element appears and a unique value 
9d at which this basis element disappears. We refer to 
Of) as the birth value^ 9d as the death value, and the pair 
(0b, 9d) € as a persistence point corresponding to the 
chosen basis element of i7fe(C(/, 9)). The difference 9d — 9i^ 
is called the life span of the persistence point. Longer life 
spans are associated with geometric features that persist 
through larger variations of 9, and persistence diagrams 
are a codification of this information. Given a scalar field 
/, the set of associated persistence diagrams are denoted 
by PD(/) = {PDfc(/)}, where PDfe(/) consists of all per¬ 
sistence points corresponding to the fc-th level of homology 
(keeping track of multiple copies of a single point), along 
with inhnitely many points at each point along the diago¬ 
nal 9b = 9d. The reason for the inclusion of the diagonal 
is made clear in Definition |5.1[ when we define metrics on 
the space of persistence diagrams. 

For the systems introduced in Section we first use 
persistent homology as a nonlinear data reduction method. 
For Kolmogorov flow we study the scalar field oj: D —>■ 'R, 
the z-component of the vorticity field, while for Rayleigh- 
Benard convection we study the scalar field T: Z? —>■ K, 
the temperature field at the mid-plane. It is important 
to note that the domains for these two scalar fields are 
different. For Kolmogorov flow, the domain ZZ is a torus 
since we are using periodic boundary conditions, while for 
Rayleigh-Benard convection, ZZ is a disk. For the disk, 
we need only to concern ourselves with the vector spaces 
Hk{C{uj, 9)) for fc = 0,1. However, for the torus, the vector 
spaces H2{C{T,9)) also need to be considered, since the 
torus encloses a three-dimensional cavity. In section 
we explain how the persistence diagrams PD(/) capture 
important information about the patterns given by the 
scalar fields w and T. 

The set of all persistence diagrams PD is a metric space, 
denoted by Per (see Section]^. Since we are studying the 
evolution of Kolmogorov flow and Rayleigh-Benard con¬ 
vection, we have time series of the vorticity {coi} and tem¬ 
perature {Ti} fields, and, therefore, we have time series 
of persistence diagrams {PD(wi)} and {PD(ri)}. We view 
each of these time series as a point cloud X C Per. To 
extract information about dynamical structures present 
in the time series, we use persistent homology a second 
time to quantify the geometry associated with this point 
cloud. This is achieved by introducing a new scalar func¬ 
tion /; Per —[0, oo) that gives the distance from any point 
in Per to the point cloud X and is defined by 

f{x) := d{x,X) := min d{x,Xi), (7) 

Xi^X 


C{f,9) = {x€D\f{x)<9), 
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where d is an appropriate metric on the space of persis¬ 
tence diagrams. The associated sub-level sets C{f,6) are 
once again given by Q. 

To carry out the steps mentioned above requires the 
ability to compute the persistence diagrams PD(/). To 
do this, we need to calculate iJfc(C(/, 0)), which requires 
a discrete representation of C(/, 0) called a complex. In 
the context of nonlinear data reduction, we make use of 
a cubical complex. When analyzing the geometry of the 
point cloud, we approximate C(/, 9) using a Vietoris-Rips 
complex, which is a special form of a simplicial complex. 
This is a classical subject and thus there are a variety of 
references providing precise definitions of complexes, e.g. 
[7] for Vietoris-Rips complexes and [IB] for cubical com¬ 
plexes, discussions of issues related to approximations |3], 
and how one proceeds from a complex to computing per¬ 
sistent homology [SlIZ]. The homological computations 
in this paper were performed using the Perseus software 

m- 

The numerical data for the vorticity and the tempera¬ 
ture fields is presented in the form of piecewise-constant 
functions defined on a rectangular lattice. For Kolmogorov 
flow, values of ui are reported in double precision. Recall 
that the vector spaces Hk{C{uj,6)) can only change for 
9 G Q, where 0 is the finite set of values that uj attains 
on the given lattice. Each of the sets C{uj, 9) is a cubical 
complex, and we use the Perseus software to compute the 
corresponding persistence diagrams using only the values 
0 G 0. Numerical simulations for Rayleigh-Benard convec¬ 
tion are carried out with high precision as well. However, 
keeping in mind our goal to compare the numerical simula¬ 
tions with experimental data, we convert the temperature 
held to an 8-bit temperature held T* (an integer-valued 
function with values between 0 and 255), which can be 
obtained experimentally. Consequences of this rescaling 
are examined in Section [6l 


co<-1.5 



(a) 


CO < 0 



(b) 


CO <0.75 



(c) 


CO < 1.5 



(d) 



(e) 


(f) 


4. Interpreting Persistence Diagrams 

The purpose of this section is to provide intuition and in¬ 
terpretation of the information that persistence diagrams 
present. As indicated in the previous section, we are in¬ 
terested in the diagrams PDfc(a;), /c = 0,1, 2, of the vortic¬ 
ity held for Kolmogorov how, and the diagrams PDfc(T*), 
fc = 0,1, of the temperature held for Rayleigh-Benard con¬ 
vection, shown in Figure]^ 

We begin by discussing PDo(w), shown in Figure [^e), 
computed from a single time snapshot of the vorticity held 
for the Kolmogorov how. The minimum value of the vor¬ 
ticity held is —2.7206, and therefore, C(uj,9) = 0 for all 
9 < —2.7206. At 0 = —2.7206, two components appear, 
indicated by the two persistence points with birth value 
0b = —2.7206. The death value of one of these two persis¬ 
tence points is 0d = —0.697, and so the two components 
merge at this value, resulting in a single component. This 
explains the persistence point (—2.7206, —0.697). The rea¬ 
son the other persistence point is denoted by (—2.7206, oo), 


Figure 3: (a-d) Sub-level sets C^,6) = {x ^ D : lj{x) < 0} of the 
vorticity field, shown in Figure a), for different values of de¬ 
picted in black, (e) PDo(a;) and PDi(aj) persistence diagrams of 
the vorticity field indicate the values of 6 at which the connected 
components and loops appear and disappear (merge together). Ev¬ 
ery point {9i,,6d) in the PDo(u;) (PDi(a;)) persistence diagram cor¬ 
responds to a connected component (loop) that appears in C(a;, 05 ) 
for the first time and is present in every set C(cj,0), for 6 < 0 < d, 
but not in A connected component disappears by merging 

with a previously existing component and a loop disappears when 
it is filled in. Video 1 of the supplementary materials provides an 
animation. 


with 9d = 00 , is because when features merge, a choice 
must be made about which topological feature (in this 
case, a connected component) dies. Having a consistent 
choice of basis over all values of 0 requires that the homol¬ 
ogy generator associated with the geometric feature that 
has the larger birth value die first. If the birth values are 
the same, then it does not matter which topological feature 
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T*<25 



(a) 


T* < 100 



(b) 



Figure 4: (a-d) Sub-level sets C(T*, 9) = {x ^ D ■. T*(a^) < 9} of the 
the renormalized 8-bit temperature field T*, shown in Figure j^b), 
for different values of 0, depicted in black. As before the persistence 
diagrams (e) PDo(i.<^) and (f) PDi(ct;) indicate the values of 9 at 
which the connected components and loops appear and disappear 
(merge together). Video 2 of the supplementary materials provides 
an animation. 


with this birth value is chosen to be the one that persists. 
In particular, this implies that the generator associated 
with one of these two initial components can never die. 

Figure [^a) indicates the subset of D corresponding to 
C(w, —1.5). We remind the reader that the domain D for 
Kolmogorov flow is a torus, since the left (top) and right 
(bottom) boundaries are identihed. Therefore, C(w, —1.5) 
consists of eight distinct connected components instead of 
nine. 

The existence of these eight connected components can 
also be extracted from PDo(a;), shown in Figure |^e). Ob¬ 


serve that these connected components correspond to con¬ 
nected regions with birth value 9b < —1.5 and death 
value 9d > —1.5. In Figure |^e), this corresponds 
to the eight points in the rectangular region R- 1.5 := 
{i9b,9d) €M^\9b< -1.5 and 9d > -1.5}. 

Figure l^b) indicates that C(a;,0) consists of four con¬ 
nected horizontal bands, which agrees with the num¬ 
ber of persistence points in the rectangular region 
Ro = {(^f),fi*d) € I 0b < 0 and 9d > O} of PDo(u;). Each 
stripe is created as two distinct components present in Fig¬ 
ure l^a) grow and merge, causing one component to die 
each time. The deaths of these components are captured 
by the points in the rectangle which are not in the 

rectangle Ro, since these are components that are born 
before 9 = —1.5 but die before 0 = 0. 

Three horizontal stripes merge together before 0 = 0.75, 
as indicated by two points inside the rectangle R- 1.5 that 
are not in the rectangle i?o. 75 - The two remaining con¬ 
nected components merge together soon thereafter, and 
for all greater threshold values, there is only one connected 
component. 

To finish our analysis of PDo(a;), we turn our attention 
to the persistence points close to the diagonal. These have 
very short life spans, which suggests that these features 
may be numerical artifacts. In our example, these points 
represent the narrow horizontal bands formed in between 
two connected components before they merge into a single 
band (see video 1 available in the supplementary materi¬ 
als). These narrow bands are formed by small oscillations 
of the vorticity field at the places where the field is almost 
constant. 

We now turn our attention to the PDi(a;) persistence di¬ 
agram, which characterizes loops in C(a;,0). Appendix |11| 
provides a detailed discussion of independent loops on a 
torus. From PDi(a;), we see that the first loop appears at 
threshold 0 = —0.963. It corresponds to one of the four 
horizontal bands shown in Figure [^b). Each horizontal 
band generates a single independent loop, corroborated 
by the existence of four persistence points in the rectangle 
Ro of PDi(a;). 

We note that the full torus has two loops captured by 
homology. This is expressed in PDi(a;) by the two per¬ 
sistence points with 9d = 00 . Observe that the first loop 
that appears at 0 = —0.963 is equivalent to one of the 
toral loops, thus it cannot be killed by any other loop, and 
hence is captured by the persistence point (—0.963, 00 ). 
The other three loops present at 0 = 0 correspond to the 
same toral loop and thus must die. In fact, they do so by 
0 = 2.5. Note that the birth values 0{, of these persistence 
points are close to the death values 0^ of the persistence 
points in \ Ro of PDo(w). This implies that shortly 

after the components merge, they form horizontal bands 
across the entire domain. 

New loops are also created as the bands start merging. 
If two horizontal bands are connected by n links, then the 
number of loops generated by this object (two bands plus 
the links) is (l-|-n). Thus, the first additional loop appears 
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when a second link is created (see Appendix 11). In our 
example, this happens near the threshold 0.75. 


In Figure [^c), there are four distinct links between the 
two horizontal bands at the top of the figure. The small 
punctures visible in Figure [^c) are filled in almost imme¬ 
diately, and the four links merge into two distinct links. 
The points in PDi(a;) that are close to the diagonal cap¬ 
ture this behavior. The other two links are present for 
a wider range of thresholds, and the loop they generate 
is represented by one of the persistence points in PDi(a;) 
with birth coordinate slightly smaller than 0.75. The hor¬ 
izontal band at the top and the horizontal band at the 
bottom are linked in a similar fashion. This explains the 
presence of another point with birth coordinate slightly 
smaller than 0.75. 


At 9 = 0.932, a connection from the top to the bottom 
boundary is created. This loop is homologically equivalent 
to the second of the two independent loops of the torus, 
and hence is identihed by the persistence point (0.932, oo). 
As the threshold passes the value 1.988, the punctures 
shown in Figure [^d) start disappearing and the corre¬ 
sponding loops start dying. Again, there are \+n indepen¬ 
dent loops for n > 0 punctures. Since the maximum value 
of w is 2.7092, the sub-level set is the whole torus for any 
threshold above this, i.e. C(a;, 9) = D for all 9 > 2.7092. In 
this case, there are no more punctures, and the rectangle 
7?2.7092 contains only two persistence points. 

Finally, we address the PD 2 (a;) persistence diagram, not 
shown for brevity. This diagram contains a single per¬ 
sistence point at {9b,oo) = (3.01,oo). The birth coordi¬ 
nate, 9b, corresponds to the minimum value of 9 for which 
C{ijj, 9) = D, the whole torus. Since C(w, 9) = D for all 
9 > 9b, this point never dies. 

We now discuss the persistence diagrams for the tem¬ 
perature field T* shown in Figure for Rayleigh-Benard 
convection. Again, beginning with PDo(T*), the points 
with short life spans correspond the large number of small 
connected components that make up C(T*,25), as shown 
in Figure l^a). Points with long life spans represent the 
well-defined connected components shown in Figure [^b). 
From the persistence diagram, we can see that these com¬ 
ponents merge almost simultaneously at two threshold val¬ 
ues, 9 « 210 and 9 « 225. 

Turning to PDi(T*), we note that the domain of the 
temperature field is a disk, so the independent loops cor¬ 
respond to punctures inside of the disk. The diagram 
PDi(T*) indicates that there are no loops with long life 
spans, and the loops that do appear do so roughly at the 
same threshold values at which the dominant components 
merge. These features are due to the small fluctuations of 
the temperature field close to the critical values at which 
different rolls merge together. This is consistent with their 
short life spans. 


5. The Space of Persistence Diagrams 

As explained in the previous section, a persistence dia¬ 
gram codifies, in a reasonably compact form, considerable 
information about the geometry of a scalar function. As 
suggested by the examples, we use persistence diagrams to 
provide a reduced description of the state of the dynamical 
system of interest at any given point in time. Therefore, 
to analyze the dynamics, we need to be able to compare 
one collection of persistence diagrams PD (corresponding 
to a snapshot of the flow pattern at an instant of time) 
to another collection of diagrams PD' (from another flow 
snapshot). There are a variety of metrics that can be im¬ 
posed on persistence diagrams [211 |22l |23l [24]. The metrics 
used in this paper rely on pairing the points p G PDj, in 
a one-to-one correspondence (bijection) with the points in 
PDj.. According to the definition, every persistence dia¬ 
gram contains an infinite number of copies of the diago¬ 
nal. Hence, there are many different bijections 7 between 
PDfe and PDj.. Roughly speaking, the distance between 
PD and PD' is defined using the bijections that “minimize 
the shift” in the mapping of the points p from PD^ to j{p) 
in PD'j.. This notion is made more precise in the following 
definition. 

Definition 5.1. Let PD = {PD^} and PD' = {PD(.} be 
two collections of persistence diagrams. The bottleneck 
distance between PD and PD' is defined to be 


dB(PD, PD') = max inf sup |b- 7 (p)lloo, ( 8 ) 
k 7: 

where || (oq, 60 ) - (oi, 5i)||oo := max {|ao - oi |, | 6 o - 611 } 
and 7 ranges over all bijections between persistence points. 
Similarly, the degree-p Wasserstein distance is defined as 


drvp(PD,PD') 


i/p 


inf 

^7: PD;,^PD'j^ 
k “ 


\\p-'y(p)\\^ 

P&PDk 


( 9 ) 


Roughly speaking, a function /: D —^ M is tame if, for 
every 6 * G M, the vector space Hk{f~^{{—oo,9])) is finite 
dimensional for every k, and there are only finitely-many 
thresholds at which the vector spaces change (for a pre¬ 
cise definition see |7])- For our purposes, it suffices to re¬ 
mark that if / is a piecewise-constant function on a finite 
complex, then / is tame. In particular, the numerically- 
computed vorticity field uj and 8 -bit temperature field T* 
are tame functions. 

For the remainder of this paper, we use Per^ to de¬ 
note the set of persistence diagrams corresponding to 77^ 
and Per to denote the set of all persistence diagrams. 
Let T{D,M.) denote the set of tame functions f:D^ 
K equipped with the L°° norm. A fundamental result 
|7] is that, using the Wasserstein or bottleneck metrics, 
PD: T{D,M.) —)■ Per is a Lipschitz-continuous function. In 
particular, if /, 5 G T{D,M.), then 


dB{PD{f), PD( 5 )) < sup \f{x) - g{x)\. ( 10 ) 

x£D 
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Birth 


(b) 


Figure 5: (a) A one-dimensional scalar field and its piece- 

wise constant approximation /. The dashed line shows a tame ap¬ 
proximation of / whose persistence diagram is identical to PD(/). 
(b) Points in the persistence diagram PD(/) are given by closed sym¬ 
bols and the points in PD(/) are represented by open symbols. Two 
points on the top have infinite death coordinate. Lines connecting 
the points represent matching of the persistence points for which the 
bott leneck distance dg(PD(/), PD(/)) is realized. It follows from 
that ds(PD{/), PD(/)) < sup^g^, \f{x) - f{x)\. 


These results on Lipschitz continuity have two impor¬ 
tant implications for this work, both stemming from the 
fact that our analysis is based on numerical simulations. 
Assume for the moment that f : D —>■ 'R denotes the ex¬ 
act solution at a given time to either Kolmogorov flow 
or the Boussinesq equations. Ideally, we want to under¬ 
stand PD(/). Our computations of persistent homology 
are based on C(/, 0^), a cubical complex defined in terms of 
the numerically-reported values 0^, where / represents the 
associated piecewise-constant function. If the numerical 
approximation / satisfies sup^jg^, \f{x) — f{x)\ < e, then by 


(101 we have a bound on the bottleneck distance between 


the actual persistence diagram PD(/) and the computed 
persistence diagram PD(/), so that dB(PD(/), PD(/)) < £. 
Figure [^provides a schematic justification of this claim. 

As indicated in the introduction, persistent homology 
is invariant under certain continuous deformations of the 
domain. To be more precise, ii g: D ^ D is a homeomor- 
phism and /: —>■ K, then PD{fog) = PD(/). Of partic¬ 
ular relevance to this paper is a function g which arises as 
a symmetric action on the domain. In this paper, we work 
with piecewise-constant numerical approximations of the 
actual functions of interest, and we cannot assume that 
this equality holds. However, if / is given and f = f o g, 
where g is as above, and we have an L°° bound e on the dif¬ 
ference between the approximation and the true function. 


then by (10) 


dB{D{f),Dif'))<2e. (11) 


In summary, under the assumption of bounded noise or 
errors from numerical simulations (or experimental data), 
we have explicit control of the errors of the distances in 

Per. 



ds 

dpp2 

dpyi 

(PD“,PD'’) 

0.01 

0.049 

0.497 

(PD“,PD'’) 

0.864 

2.648 

12.35 

ratio (pQa pQb) 

86.4 

54.05 

24.85 


Table 1: Distances between selected persistence diagrams (rounded 
to 3 decimal places) shown in Figure]^ corresponding to the vorticity 
fields given by Figure]^ 


6. Using Metrics in the Space of Persistence Dia¬ 
grams 

The goal of this section is twofold: one, to provide in¬ 
tuition about the information contained in the different 
metrics, and two, to suggest how viewing a time series in 
Per can provide insight into the underlying dynamics. 

We begin by remarking that the bottleneck distance ds 
measures only the single largest difference between the 
persistence diagrams and ignores the rest. The Wasser- 
stein distance dwp includes all differences between the di¬ 
agrams. Thus, it is always true that 


rfs < d\Yp ■ (12) 

The sensitivity of the Wasserstein metric to small differ¬ 
ences (possibly due to noise) can be modulated by the 
choice of the value of p, i.e. ii p > q, then one expects dwp 
to be less sensitive to small changes than dwi- In this 
paper, we restrict ourselves to the bottleneck distance ds 
and the Wasserstein distances dwp for p = 1,2. 

The most obvious use of these metrics is to identify or 
distinguish patterns. As an example, we consider patterns 
along an orbit from the Kolmogorov flow. As indicated in 
Section |2.1[ this particular trajectory arises from a peri¬ 
odic orbit with a slow drift along an orbit of continuous 
symmetry. In particular, we consider the three time points 
indicated in Figure [^a): two that appear to differ by the 
continuous symmetry, and a third that lies on the ‘oppo¬ 
site’ side of the periodic orbit. Plots of the associated vor¬ 
ticity fields at these points (see Figure agree with this 
characterization of the time points. We want to identify 
this information through the associated persistence dia¬ 
grams PD“, PD**, and PD°, shown in Figure[^ Indeed, the 
plots of PDj) and PD^. are difficult to distinguish, but PD^ 
is clearly distinct. To quantify this difference, we make use 
of the distances between the persistence diagrams using 
ds: dw^, and dw^ ■ These values are recorded in Table 
Not surprisingly, the distances between PD^* and PD^ are 
much smaller than the distances between PD“ and PD'’. 
We want to use these distances, as opposed to the detailed 
information in the persistence diagrams, to obtain rough 
information about how the pattern at Figure |^a) differs 
from the pattern at Figure |^c). 

The patterns shown in Figure [^a)-(b) differ by a sym¬ 
metric transformation, and so (iB(PD“, PD*”) can be inter¬ 
preted as a lower bound on the numerical errors. Now 
observe that either PD“ or PD" must have a persistence 
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(a) (b) (c) 


Figure 6: Three snapshots of the vorticity fields uj from the stable 
relative periodic orbit of the Kolmogorov flow, found at Re = 25.43. 
The vorticity fields correspond to the (a) diamond, (b) square, and 
(c) circle in Figure Qa). The persistence diagrams for these three 
snapshots are generated and compared in Figure]^ Differences be¬ 
tween the persistence diagrams are qualitatively measured by the 
distances shown in Table [T] 



Figure 8: (a) Contributions to the (PD'^.PD*^) distance 

for Rayleigh-Benard convection. (b) Contributions to the 
(PD^.PD"^) distance for the Kolmogorov flow. 


PD 


0 


PD 


1 




Figure 7; PDq persistence diagrams PD“, PD^ and PD'^ for the vor¬ 
ticity fields shown in Figure ^ The points in PD“ and PD*" are 
almost identical because the corresponding vorticity fields are sim¬ 
ilar. The points in PD'^ are more spread out and do not shadow 
the points in PD“ so well. The same is true for the PDi persistence 
diagrams which are not shown. So d*(PD“^D^) < d*(PD“, PD'^), 
for * S {S, W^, as indicated by Table 

point with life span greater than dB(PD“,PD'^) (oth¬ 
erwise pairing the persistence points with the diagonal 
will produce a smaller ds distance). Since the ratio 
dB(PD“, PD'^)/(iB(PD“, PD*”) is 86.4, we know that there 
is a significant distinction between differences that should 
be ascribed to error and differences that can be ascribed 
to significant geometric features. 

For some applications, there might be only two differ¬ 
ent scales at which the geometric features evolve: one scale 
corresponding to the signal, and the other representing the 
noise fluctuations. If that is the case, then there are only 
two types of changes. If we suppose that the large changes 
are comparable to PD°) and the noise fluctuations 

are of the order (iB(PD“, PD^), then we can approximate 
the distances (PD“, PD^) and (PD“, PD'^) as fol¬ 
lows: 

dvvi(PD“, PD^) « n ■ dB(PD“, PD^) -P k ■ dB(PD“, PD*'), 

(13) 

dw2{PD\PD^) « 

/- (14) 

^n(dB(PD“, PD^))2 + fc(dB(PD“, PD*'))2, 


where n is the number of features that change significantly 
and k is the number of features that change very little. 
We recall that dB(PD“,PD*’) < dB(PD“,PD^) (XableQ. 
Hence, the significant contributions to the d \^2 metric are 
of the order of dB(PD“, PD°), and the small changes do not 
significantly contribute to dw^ ■ This leads to the following 
approximation: 


(PD“, PD“) « V«(f^B(PD“,PD"))2. (15) 


By solving (13) and (^15|, we obtain n = 9 and k = 383, 


and so the number of features that change significantly is 
bounded from below by 9. Before we discuss the values 
of n and fc, let us repeat the same computation for the 
Rayleigh-Benard convection. 

We recall that the 8-bit temperature field T* is an 
integer-valued function with values between 0 and 255. For 
integer-valued functions, the smallest nonzero ds distance 
between distinct frames is e = 0.5. We use this number as 
the lower bound on the numerical errors. The snapshots A 
and C, not shown for brevity, are from a single orbit, and 
they realize the maximal distance between two snapshots 
(exact distances are given by Table [^. Solving Equations 
((l3|) and (15) yields n = 35 and k = —4400. These num¬ 


bers obviously do not make sense. Therefore, the changes 
cannot be divided into two distinct groups, as we assumed 
above. Figure |^a) shows that there is only one change 
on the order of (iB(PD“, PD'^). All the other changes are 
at least an order of magnitude smaller. More precisely, 
there are 77 changes that are between one to two orders 
of magnitude smaller than dB(PD“, PD"^). Moreover, these 
changes are at least an order of magnitude larger than our 
error estimate e = 0.5, and there are also 105 changes of 
the size 2e. Due to the significant number of contributions 
at all orders of magnitude between the noise estimate and 
dB(PD“, PD'^), we cannot assume that there are predomi¬ 
nantly two distinct types of changes: one corresponding to 
the noise and the other to the signal. Thus, the approx¬ 


imation (15) of dyy^ is not valid in this setting because a 
large part of (7^2 (PD“, PD'^) comes from contributions at 
the intermediate scales between e and dB(PD“, PD*'). 

We now return to the values of n and k for the 
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Figure 9: (a) Distances d* between the consecutive sample points along the stable relative periodic orbit of the Kolmogorov flow are shown 
for just over three periods. Distances are normalized by their maximum value which is 0.0654 for 0.2266 for and 1.9143 for 

dy/i ■ Distance between the consecutive sample points can be interpreted as the speed at which the orbit moves in the space of persistence 
diagrams. Speed is not uniform along the orbit. Instead, there are parts of the orbit where the dynamics are slow, separated by relatively 
fast evolution, (b) Distances d* between the consecutive sample points along an almost-periodic orbit from Rayleigh-Benard convection are 
shown for approximately 2 periods. Distances are again normalized by their maximum value which is 83.5 for ds, 113.66 for d^ 2 , and 383 
for dy^ri. 



ds 

d\Y2 

d]^i 

(PD^,PD'"') 

81.5 

480.9 

650.5 


Table 2: Distances between selected persistence diagrams (rounded 
to 3 decimal places) corresponding to two different 8-bit temperature 
fields obtained from a single orbit of Rayleigh-Benard convection. 


Kolmogorov flow. Figure [^b) shows that there are 
approximately 200 changes of order smaller or equal 
to c?b(PD“, PD^), and 11 dominant changes of order 
dB(PD“, PD'^). Finally, we can identify approximately 
28 changes occurring on intermediate scales, and their 
sizes are at least an order of magnitude smaller than 
dB(PD“, PD'^). In fact, most of them are not much larger 
than dB(PD“,PD^)- Hence, the changes can be roughly 
divided into two classes of different order. The fact that 
the division is not absolutely sharp leads to n = 9, which 
is smaller than the actual number of dominant changes on 
the order of dslPD'*, PD°). 

We now turn to the question of understanding dynamics 
from the time series in Per. Let fi denote the scalar field 
of the system at time ti. If At = — ti is small and the 

evolution of the system is continuous, then because d* (for 
* G {B, } ) is a metric, 

can be interpreted as an average speed in the space of 
persistence diagrams over the time interval The 

value of s* depends on the choice of metric. For exam¬ 
ple, SdB is the rate at which the largest change between 
the geometric features of the scalar fields occurs. The 
speeds measured by dwp, P = 1,2, keep track of the rate 
of change between all geometric features, though to some 
extent, suppresses the effect of noise. 

Figurej^a) shows distances d* between consecutive sam¬ 
ple points, normalized by the maximum distance, of sam- 



Sample point Sample point 


(a) 


(b) 


Figure 10: (a) Distance matrix D, generated by the metric, 

for approximately three periods of the stable relative periodic orbit 
of the Kolmogorov flow. The large black patches correspond to the 
parts of the orbit with slow dynamics. Equally spaced black lines 
parallel to the diagonal suggest periodicity of the orbit with period 
equal to the distance between these lines, (b) Distance matrix D, 
generated by the ds metric, for 2 periods of the almost-periodic orbit 
of Rayleigh-Benard convection. The checkerboard pattern indicates 
that sampling is too sparse, and fast dynamics are not resolved with 
the level of sampling. 


pies taken along approximately three periods of the stable 
relative periodic orbit of the Kolmogorov flow described in 
Section [2.1 1 Normalizing s* by the maximum speed along 
the orbit furnishes the same curves. Each of the graphs 
of s* indicate that speed is not uniform along the orbit; 
there are parts of the orbit where the geometry is changing 
slowly, separated by intervals of relatively fast evolution. 
The evolution is extremely slow around the states 100, 240, 
and 380, where the speeds s* are below the noise (fluctua¬ 
tion) levels given by the first row of Table This suggests 
that the orbit may be passing close to a fixed point. 

While the general shapes of the speed profiles for dif¬ 
ferent distances are similar, there are places where the 
signs of their derivatives differ. As the system starts ac¬ 
celerating around t = 100, all three speeds are increas¬ 
ing. Around t = 130, the speed starts decreasing 
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while the other two speeds are still increasing. Note that 
around t = 130, the speeds rise above the noise level (fluc¬ 
tuations). The fact that sb and are both increasing 
means that the changes between the prominent geomet¬ 
ric features are growing in this region. The speed is 
decreasing in this region and so the noise (error) fluctua¬ 
tions are decreasing. At t = 170, the dominant geometric 
features start to evolve considerably. Changes of the dom¬ 
inant features are the most important contributions to all 
three metrics. Therefore, the derivatives of the speeds s* 
have the same sign again (see video 3, 4, or 5 in the sup¬ 
plementary materials). 

Figure [^b) shows the normalized speed profiles for the 
Rayleigh-Benard convection simulations. As in the case of 
the Kolmogorov flow, all three metrics indicate that there 
are two distinct speed scales along the orbit. However, the 
speed profile for ds differs significantly from those of d^/i 
and particular, it suggests that for significant time 

periods, the major geometric features of the temperature 
field vary only slightly, followed by two rapid bursts of 
change. This can be verified by viewing video 6, 7, or 8 of 
the supplementary materials. Away from the rapid bursts, 
ss is close to 1. The temperature field has integer values, 
so the changes cannot be smaller than 0.5. This implies 
that both swi and sw'^ are dominated by the small fluc¬ 
tuations which are roughly of order 1. Hence, the relative 
speeds and have essentially the same shape. 

The plots of s* hint at the underlying dynamics being 
that of a periodic orbit. However, it is important to keep 
in mind that Per is an infinite-dimensional space, and thus 
periodicity in the speed of a trajectory does not imply 
that the trajectory lies on a closed curve. As an example. 
Figure l^b) suggests that there are just over four periods 
of Rayleigh-Benard convection shown, and that a single 
period is roughly 125 frames long. However, looking at 
video 6, 7, or 8 (in the supplementary materials), it is clear 
that a full period is closer to 250 frames. Similarly, it is 
not obvious that extended periods of high speed imply that 
the pattern changes significantly over that time period (a 
periodic orbit of small diameter can exhibit high speed). 
This requires a more global geometric analysis of the time 
series, which we discuss shortly. 

With the same data set used to generate Figurej^a) and 
letting ujj denote the vorticity field at time tj, Figure[To|(a) 
exhibits the d^r 2 distance matrix D for Kolmogorov flow, 
with color-coded entries D{i,j) := (PD(wi), PD(tLij)). 
(The ds and d\^i distance matrices look very similar and 
are not shown.) Observe that D{i,i) = 0 and D is sym¬ 
metric since D{i,j) = D{j,i). Furthermore, Figure|^a) is 
a plot of the immediate off-diagonal entries. A striking fea¬ 
ture of the distance matrix in Figure [l0] ( a) is the existence 
of dark lines parallel to the diagonal, spaced at intervals of 
roughly 110 samples. This indicates that, in the space of 
persistence diagrams, the trajectory periodically repeats 
the same, or nearly the same, state. Since the diagonals 
are spaced at roughly 110 samples, we can indeed say that 
the orbit revisits very similar states at intervals of roughly 


110 samples. Similarly, the light regions close to the diag¬ 
onal in Figure [To|(a) correspond to the times in Figurej^a) 
at which the speed is large, indicating significant changes 
in the pattern at these times. 

To obtain a more global analysis we turn to Fig¬ 
ure [Mb) that shows the distance matrix D{i,j) := 
d^ 2 (PD(T*), PD(Tj‘)) for the temperature fields T* cor¬ 
responding to the trajectory from Rayleigh-Benard con¬ 
vection. Distances between the consecutive temperature 
fields are shown in Figure [^b). The dark diagonal lines 
are spaced at intervals of roughly 250 samples. Thus, even 
though the Figure [^b) suggests a period of approximately 
125, the orbit does not revisit the same state in the space of 
persistence diagrams every 125 samples, but instead every 
250 samples. 

7. Analyzing a Point Cloud using Persistent Ho¬ 
mology 

The discussion in the previous section suggests that in¬ 
teresting information concerning the dynamics of the ge¬ 
ometry of time-evolving scalar fields can be obtained by 
studying the time series in the space of persistence dia¬ 
grams. Note that each scalar field is represented by a per¬ 
sistence diagram PD(/) and thus corresponds to a point 
in Per. We argue that viewing the time series as a point 
cloud in the space of persistence diagrams and studying its 
geometry provides useful information about the dynamics. 

For a point cloud X C Per and the scalar function 
/: X —>■ [0, oo) given by ([T]) (for any of the metrics 
or d^i), the sub-level set C(/, 9) defined by § is a union 
of balls 

C(/,d)= [j B{PD,e), (17) 

PDex 

where B{PD,9) = {PD^ £ Per | d(PD\PD) < 6*}, and d is 
the appropriate metric. In general, one should expect that 
the sets C(/, 9) are complicated. Therefore, computing 
7d*(C(/, 9)) directly is not practical. Instead, we make use 
of the following complex. 

Definition 7.1. Given a point cloud X = {xq, ..., xat} 
in a metric space with distance function d, the Vietoris- 
Rips complex at scale 9, denoted R(A,d), is the simplicial 
complex defined by the collection of simplicies 

{(x„o, ...,x„J I d(x„,,x„J < 2d, for all i,j G {0,1, 2,..., fc}} . 

Observe that the Vietoris-Rips complex is determined 
by the distance matrix associated with X, and hence, there 
is a finite set of threshold values 0 = {d^} at which the 
complex changes. Thus, given a point cloud A in a metric 
space with metric d, the associated persistence diagrams 
PD(A, d) are determined by the Vietoris-Rips complexes 
R(X, d) for d G 0. 

We emphasize that the only data used to analyze a point 
cloud based on the persistent homology of Vietoris-Rips 
complexes are the pairwise distances between the points 
given by the distance matrix associated with X. 


11 



Figure 11: (a) Distance matrix representing pairwise Euclidian distances dE between the points in (b) a point cloud X. (c-e) The blue shaded 
regions indicate the sub-level sets C{X,0) for 9 = 0,0.1755,0.5135, and 0.816. The points, edges and triangles indicate the Vietoris-Rips 
complexes R(X, 6). (c) For 6 = 0.1775 the set C{f,6) consist of three distinct connected clusters. The same is true for 'R,{X,6). The points 
in each connected component of C{f,6) are connected by edges in 'R,{X,6). (c) The three components remain distinct until 6 = 0.5135, at 
which point two components of C{f,6) merge and an edge connecting the points in the merged components appears in 'R.{X,d). 


7.1. Detecting Clusters 


Since /3o counts components, it is reasonable to use per¬ 
sistent homology as a clustering tool. We demonstrate this 
idea on a point cloud with pairwise distances given by the 
distance matrix shown in Figure [T^ a). A possible config¬ 
uration of the six points in is depicted in Figure [TT|b). 
Using the length scale presented in Figure as an 

indicator of the order of magnitude at which we want to 
declare a separation length for the clusters, there are three 
clusters. We now focus on the geometric information con¬ 


veyed by PDo(A, d^;), shown in Figure 12 

Observe that C(/, 0) = R(Ar, 0) consists of 6 vertices. As 
6 increases, the distinct connected components of C(/, 9) 
(as defined in 0) start merging together. In fact, when 
the balls B{xi,9) and B{xj,9) merge together, an edge 
{xi,Xj) appears in R(A, 0). Therefore, Ho{C{f,6)) = 
Ho{R{X,9)) for all 0 S K, and PDo(/) = PDo(A, d^;). 
Note that it is impossible for a new connected compo¬ 
nent to appear for 0 > 0. Hence, all persistence points in 
PDo(A, d_E) have a birth value equal to zero. The death 
coordinates represent the spatial scales at which distinct 
connected components (clusters) merge together. Say that 
we are interested in clusters where the minimal separation 
is on the order of 1 length scale. These clusters corre¬ 
spond to the points in PDo(A, de) with the death coordi¬ 
nate greater than approximately 0.5, and there are three 
persistence points that satisfy this criterion. Thus, we de¬ 
clare that there are three clusters. If the relevant scale for 
separation is of an order of magnitude smaller, then there 
are five clusters, since, in addition to the three points with 
death value greater than 0.5, two points have death values 
slightly larger than 0.05. 

Alternatively, if we are interested in dividing the data 
into two clusters, then PDo(W, d^) can be used to deter¬ 
mine the magnitude of the separation between the clus¬ 
ters. Observe that the persistence point (0,oo) corre¬ 
sponds to the final connected component. The persistence 
point (0,0.816), with the largest finite death coordinate, 
indicates that the components merged at a distance 0.816. 



Birth 

Figure 12: Persistence diagram PDo(X, corresponding to the 
distance matrix in Figure [^a). 


Hence, the minimal distance between points from the point 
cloud X that belong to two distinct clusters is 1.632. 

7.2. Detecting Loops 

Since /3i counts loops, it is reasonable to use persis¬ 
tent homology as a tool for identifying cycles that arise 
from dynamics. Consider any point cloud that generates 
a distance matrix as in Figure [T^a). Again, for the sake 
of intuition, we present in Figure (Hb) an example of a 
point cloud A C with pairwise distances given by the 
distance matrix shown. The persistence diagrams for the 
associated Vietoris-Rips complex filtrations are shown in 
Figure [Ml 

Applying the reasoning from the previous section, we 
can ask whether there is a natural or interesting clustering 
of the data. If, as before, we insist that we are interested 
in clusters where the minimal separation is on the order 
of length scale 1, shown in Figure |l^b), then (0,oo) is 
the only persistence point with death value greater than 
0.5, i.e. at this scale there is only one component. Thus, 
we conclude that from a geometric perspective we may 
treat the point cloud as arising from a single dynamical 
structure. 
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Figure 13: (a) Distance matrix representing pairwise Euclidian distances (Ie between the points in (b) a point cloud X. (c-e) Sets C(/, 0) for 
0 = 0, 0.177, 0.343 and 0.596. The homology of C(/, 6 ) can be approximated by a Vietoris-Rips complex R(X, 9) given by the vertices, edges, 
and triangles shown in (b-e). The first loop in C(/, 0) is created at 0 = 0.177. This loop is due to the noisy sampling and is filled in almost 
immediately. The dominant loop shown in (c) is formed at 0 = 0.343 and persists until 0 = 0.596. 




(a) (b) 

Figure 14: Persistence diagrams (a) PDo{X, cIe) and (b) PDi(X, 
corresponding to the distance matrix in Figure [T^ a). The persis¬ 
tence diagram PDi(X,(I e) contains a dominant point (0.343,0.596) 
corresponding to the robust loop shown in Figure Hid) while the 
point (0.177,0.25) represents the small loop visible in Figure [T^ c). 


We now look for cyclic structures. Observe that 
PDi(V, contains two persistence points. The life span 
of point (0.177,0.250) is 0.06, which is short compared to 
the order 1 length scale, and thus it is reasonable to think 
of this as a result of noise in the data. This is substan¬ 
tiated by Figure [T^c), in which the loop in the Vietoris- 
Rips complex R(V, 0.177) consists of four edges. An addi¬ 
tional edge and two triangles (two 2-simplicies) appear in 
R(A, 0.250), see Figure [I^d). The triangles fill in the loop 
formed by the edges of R(V, 0.177). Two of the four data 
points that are involved in the construction of this loop 
can be viewed as arising from noise or errors associated 
with sampling points from a smooth cycle. 

The life span of persistence point (0.343,0.596) is 0.253 
and suggests that the point cloud is generated by a loop 
with a minimal radius of 0.596, which is on the order of 
the scale of the data. This suggests that the associated 
cycle, indicated in Figure [I^d), represents an observable, 
robust dynamical feature. 


7.3. Application to Systems with Multiple Time Scales and 
Large Data Sets 

Characterizing the geometry of a continuous orbit via 
an approximation by a discrete time series depends on the 
frequency of sampling, and thus becomes a challenge in the 
setting of dynamics with multiple time scales, i.e. when 
the rate of change of the patterns is far from constant. If 
the sampling rate is too slow, then parts of the orbit will be 
poorly (or not at all) sampled. Note that the geometry of 
the continuous trajectory may be more complicated than 
that of a circle; secondary structures might occur if the or¬ 
bit is twisted, pinched, or bent in Per. Thus, the missing 
parts of the orbit could distort (or entirely miss) signifi¬ 
cant features in the geometry of the sampled trajectory as 
compared to the geometry of the underlying (continuous) 
dynamics. Thus, in order to obtain a description of the 
geometry on all relevant spatial scales, including informa¬ 
tion about secondary structures, the sampling rate needs 
to be fast enough. 

To determine if a trajectory has been sampled densely 
enough to resolve the geometry of the underlying dynam¬ 
ics, it is useful to compare the following three values re¬ 
lated to the point cloud in Per: the noise threshold of the 
system, the maximum consecutive distance in the sam¬ 
pled trajectory, and the diameter of the point cloud. Ide¬ 
ally, once a noise threshold has been computed, one would 
like distances between consecutive points from the sam¬ 
pled trajectory to be on the length scale of the noise. 
If sampling faster than this, the features detected from 
the sample that are on the scale of the noise would be 
indistinguishable from artifacts generated from the noise 
in the sample. Thus, ideally, the distance profiles (e.g. 
Figure for Kolmogorov flow and Rayleigh-Benard con¬ 
vection) should have maximums no larger than the noise. 
Unfortunately, this is not practical for reasons that will be 
explained next, and fortunately it is often not necessary. 
For example, the length scale of the computational noise 
could be much smaller than the relevant length scale of in¬ 
terest for studying the geometry of the dynamics. In this 
case, a comparison of the maximum consecutive distance 
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in the sample to the diameter of the point cloud in Per is 
often useful. For instance, if a point cloud has diameter 
100 and the smallest relevant length scale for the geometry 
to be studied is 10, then a maximum consecutive distance 
of 10 is sufficient for the sampling of the time series, even 
if the noise threshold is on length scale 1. Thus, it is the 
interplay of these three numbers that determine if one has 
sampled a continuous time series densely enough. 

Evaluating these three quantities from an initial time 
sample may indicate that an increase in the sample rate 
is required to resolve the dynamics at the relevant spatial 
scale. In the context of a large-scale computation such 
as that required for the 3D simulation of Rayleigh-Benard 
convection, it is easier to save the data at a higher sam¬ 
pling frequency than to develop numerical methods that 
save data based on an adaptive time step. In Section [Tol 
we demonstrate the approach introduced here using ap¬ 
proximately 7 X 10® equally-spaced snapshots of the tem¬ 
perature field of Rayleigh-Benard convection. It should 
be immediately apparent that the set X is too large to 
compute the associated persistence diagrams PD(X, d*), 
for * e directly. The first step would require 

computing the distance matrix for X, which would involve 
49 X 10^° distance computations. Note, however, that us¬ 
ing a fast sampling rate leads to collecting unnecessarily 
many samples at the places where the dynamics are slow. 
This suggests that an appropriate choice of down-sampling 
will allow us to capture the global geometry of the point 
cloud. 

Definition 7.2. Let X be a point cloud in a metric space 
(M,d). Fix d > 0. A set T C X is a 5-dense subsample 
of X if for every x € X, there exits a. y G Y such that 
d{x,y) < 5. 

The following theorem [5^ guarantees that using a <5- 
dense subsample enables us to detect geometric features 
with life span larger than 6 . 




Solution 


(c) 


Figure 15: (a) Pairwise distances between the EQ and REQ 
points in X = {tOn \ n = 1,..., 67} of the Komogorov flow found at 
Re = 26.43 using Newton’s method, (b) Corresponding persistence 
diagram PDo(X, (c) The distance matrix (with values halved 

to match the values in PDo{X, ds)) sorted so that equilibria within 
each of the seven clusters, detected by PDo(-X’,dg), are grouped to¬ 
gether. The seven black blocks on the diagonal correspond to the 
seven clusters, while off-diagonal blocks correspond to values of 6 at 
which the clusters merge. 


which takes advantages of parallel computing structures 
and metric trees. 


8. Distinguishing Equilibria 


Theorem 7.3. Let X be a point cloud in a metric 
space {M,d) and Y a 5-dense subsample of X. Then 
dBiPD{X,d),PD{Y,d)) < 6. 

Remark 7.4. According to the above theorem, there exists 
a bijection between the points in PD(y, d) and PD(X, d) 
such that the distance between matched points is less than 
S. Furthermore, it can be shown that there is a bijection 
7 : PD(Y,d) — ^ PD(X, d) with the following property: if 
l{Sb,Sd) = ^ then 0 < 9^ — 0b < 6 and 

0<9'a-9d<S. 

To optimize the computational cost, we wish to choose 
a subsample of the point cloud Y as small as possible. 
A point cloud Y is 6-sparse if, for every pair of distinct 
points yi, 2/2 G Y, the distance d(2/i, 2 / 2 ) > 5. Given a point 
cloud X and a value d > 0, a d-dense, d-sparse subsample 
Y may always be constructed [5^. Due to the size of 
the point cloud X and the complexity of computing d* 
for * e B,W‘^,W^, we use an alternate algorithm ^5] , 


We now apply the ideas presented in Section |7.1| to the 
problem of clustering symmetry-related equilibria of the 
Kolmogorov flow at Re = 26.43. As discussed in Sec- 
tio n |2.1[ we sample a turbulent trajectory, shown in Fig¬ 
ure (TTb). To obtain the EQ and REQ solutions, we use 
a Newton method. The initial guesses for the Newton 
method are the vorticity fields w that are local minima of 
the norm of dojjdt. In this way, we find a collection X = 
{ujn I n = I,..., 67} of vorticity fields corresponding to EQ 
and REQ of the Kolmogorov flow. These 67 solutions may 
be related to one another through any composition of the 
coordinate transformations listed in Section [O] Hence, it 
is non-trivial to determine how many unique classes of so¬ 
lutions there are and which solutions belong to which class. 
To perform this analysis, we use persistent homology. 

We start by analyzing PDo(X, ds). The pairwise dis¬ 
tances between the points in X are shown in Figure [T^ a). 
As is discussed in Section the distance between per¬ 
sistence diagrams of vorticity fields related by symmetry 
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is small, while persistence diagrams corresponding to the 
vorticity fields that are not symmetry related differ by a 
larger amount. This implies that we can reformulate the 
question of identifying symmetry classes of equilibria as a 
clustering problem. 

The persistence diagram PDo(X, ds), depicted in Fig¬ 
ure (Mb), shows a clear gap between the persistence point 
with death value Od = 0.0285 and the persistence point 
with death value 9d = 0.1215. We interpret this gap as 
separation between the signal and noise (numerical errors). 
Indeed, 0.0285 is just twice the estimate of the lower bound 
on numerical errors for the Kolmogorov flow obtained in 
Section]^ There are 7 points in PDo(X, ds) with death co¬ 
ordinate greater than 0.12, and so we conclude that there 
are seven distinct symmetry classes of solutions. 

Grouping the symmetry related equilibria correspond¬ 
ing to seven different clusters and reordering the distance 
matrix enable us to see how many solutions are in each 
cluster. This is done by thresholding the distance matrix 
so that entries greater than a certain value (in this case 
6 = 0.12) are zeroed out, and then viewing the resulting 
distance matrix as an adjacency matrix, from which it is 
possible to determine connected components. The seven 
black diagonal blocks of the matrix D{i,j) = dsiuji, Wj)/2, 
shown in Figure [Tsj^c), represent pairwise distances be¬ 
tween the symmetry-related solutions in each cluster. The 
inter-cluster distances are given by off diagonal blocks. Re¬ 
ordering the distance matrix and dividing its entries by two 
makes it easier to tie its values to the death coordinates of 
the points in PDo(X, ds). 

The values of the off-diagonal blocks between the first 
three blocks on the diagonal are all roughly 0.2, implying 
that each of these clusters will merge together at approx¬ 
imately 9 « 0.2. This behavior is captured by the per¬ 
sistence points (0,0.20035) and (0,0.208) in PDo(X,ds). 
Recall that the merging of three connected components 
causes the death of just two of them. The next three 
blocks on the diagonal have off-diagonal blocks with values 
at roughly 0.16, and the deaths of two of these underlying 
components correspond to the persistence points (0, 0.163) 
and (0,0.1633). The last diagonal block has a distance 
of roughly 0.12 from the sixth block, and the merging of 
their underlying clusters corresponds to persistence point 
(0,0.1215). Thus, at the cutoff value 9 = 0.21, there are 
two components in the dataset, with one corresponding to 
the first three diagonal blocks and the other the last four. 
The off-diagonal block that relates the second and sixth 
diagonal blocks has a value of roughly 0.28. This distance 
corresponds to the persistence point (0,0.2775), at which 
time the two large components merge together. 

We validate the results of the persistence homology anal¬ 
ysis by performing clustering using the Fourier amplitudes 
as follows. If u}{kx, ky) is the Fourier amplitude of a mode 
(kxyky), then a translation of the pattern in the x or y di¬ 
rections in real space merely adds to the phase of ui{kx, ky), 
leaving the magnitude unchanged. Hence, by comparing 
the amplitudes of the Fourier modes we could group vor¬ 


ticity fields which are related by translations. Since the 
conjugate modes ui{±kx,iky) relate fields which are re¬ 
lated by inversion, to group the vorticity fields which are 
related by a combination of inversion and translation, we 
sum the amplitudes of the conjugate modes. Adding the 
amplitudes of conjugate modes yields a “reduced matrix,” 
which is unique for all the vorticity fields related by the 
coordinate transformations that leave Equation ([^ invari¬ 
ant. This approach also yields 7 distinct classes. 

An analysis of PDo(A',divp)j P = 1,2, yields the same 
results. There are several gaps between the death val¬ 
ues of the points in the persistence diagrams. Again one 
of the gaps starts at roughly twice the value of the esti¬ 
mated lower bound of the noise. However, the separation 
is less pronounced. As discussed in Section the dwp 
metrics capture all the differences between the persistence 
diagrams, and the local numerical errors are summed to¬ 
gether. Thus, a large number of small errors can obscure 
the distinction between the signal and noise. 


9. Stable Periodic Orbit of the Kolmogorov Flow 


In the previous section, we demonstrated the practical¬ 
ity of using persistent homology to cluster equilibria that 
are symmetry-related. In this section, we extend these 
ideas to the setting of recurrent orbits in the context of 
the Kolmogorov flow. 

As is indicated in Figure [^a), the projection of the or¬ 
bit onto the real parts of the three dominant eigenvectors 
suggests a periodic orbit that is undergoing a slow drift in 
the direction of the continuous symmetry. The nature of 
this drift is reinforced by tracking this orbit in the space of 
persistence diagrams; since persistent homology is invari¬ 
ant under the continuous symmetry, this type of drift is 
not present in Per. As a result, we expect the time series 
to lie on a closed loop in Per. This is consistent with the 
information provided by the distance matrix of Figure 
in which the dark lines parallel to the diagonal indicate 
that the distance between persistence diagrams becomes 
very small at regular time intervals. 

For the remainder of this section, we use the ideas of Sec¬ 


tion 7.2 to verify that a circle provides a good description 
of geometry of the point cloud X C Per generated by the 
time series sampled from the Kolmogorov flow. More pre¬ 
cisely, we show that there is a single dominant feature in 
PDo(X,ds) and a single dominant feature in PDi(Ar,d^), 
which agrees with the persistent diagrams for a circle. 

There are two issues that need to be considered: the first 
is the size of the data set, and the second is the spacing be¬ 
tween the data points. As is indicated in Sectionwe use 
the Vietoris-Rips complex to compute persistent homol¬ 
ogy of point clouds. We remark that given N data points, 
the full Vietoris-Rips complex has 2^ cells. Considering 
this, we complete our analysis with the distance matri¬ 
ces corresponding to dB,d-^i, and dw'^ for 500 points, or 
roughly three periods of the Kolmogorov flow. In the next 
section, we introduce techniques for computing persistence 
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Birth Birth 

(a) (b) 

Figure 16: (a) The persistence diagram PDo(-X’, ds) for Kolmogorov 
flow at Re = 26.43. Since all points with finite death coordinates die 
before 0.025, there is only a single dominant point, (b) The persis¬ 
tence diagram PDi(X, dg), showing the single dominant generator 
at (0.0215,0.1559). 

on larger point clouds, which could arise due to increased 
sampling rates, sampling more periods, or both. 

Since we are sampling from a single continuous trajec¬ 
tory, the fact that PDo(V, ds), as shown in Figure [I^ a), 
suggests the existence of a single component does not come 
as a surprise. The persistence diagrams for PDo(V, dvi^p), 
p = 1,2, yield similar results and are not shown. However, 
it is worth noting that this is not a foregone conclusion 
as the location of and spacing between the points of the 
time series are dependent upon the speed along the peri¬ 
odic orbit. As is clear from Figure [^a), the speed of the 
trajectory is not constant. However, it is fairly smooth, 
thus we do not expect extreme differences in the spacings 
between points. 

As discussed in Section pFSl we compare the noise thresh¬ 
old, 9 = 0.01 (Table [^, to the maximum consecutive sam¬ 
ple distance, ds = 0.0654 (Figure |^a) caption), and the 
diameter of the point cloud, 2.64 (FigureflOla)). The max¬ 
imum consecutive sample distance is more than six times 
larger than the length scale of the noise for this system. 
However, the diameter of the point cloud is more than 
forty times larger than the consecutive sample distance. 
Thus, features on the length scale of one fortieth of the 
diameter of the entire point cloud will be resolved, which 
is sufficiently small to consider this an adequate sampling. 
We will return to this issue in the next section. 

As indicated in Figure [T^b), the persistence diagram 
PDi(Ai, ds) clearly detects a single dominant loop along 
which the data is organized. Thus, we conclude that in Per, 
equipped with the metric the point cloud X generated 
by the time series forms a loop with a minimal radius of 
0.1344. Table shows the coordinates of the persistence 
point with the longest life span, its life span, and the sec¬ 
ond longest life span for each of the persistence diagrams 
PDi{X,di,), * e B,W'^,W^. As the table indicates, the 
life span of the dominant point is an order of magnitude 
larger than the next longest life span in each case, and 


PDi 

Dominant 

coordinate 

Max life span 

2 nd life span 

ds 

(0.022,0.156) 

0.134 

0.013 

dyy2 

(0.075,0.405) 

1.366 

0.105 

d\yl 

(0.703,2.069) 

0.330 

0.016 


Table 3: The coordinate of the dominant point in the persistence 
diagram PDi(X, d*) for ★ = S, its life span, and the second 

largest life span. 

so there is a single dominant feature in PDi(Ar, d*). Ad¬ 
ditionally, note that the second longest life spans are as 
small or smaller than the lower bounds on numerical er¬ 
rors indicated by the first row of Table 

10. Almost-Periodic Orbit of Rayleigh-Benard 
Convection 

As mentioned in Section |7.3[ characterizing the geom¬ 
etry of a continuous trajectory becomes a challenge in 
the setting of dynamics with multiple time scales. To 
demonstrate this, we consider the numerical simulation of 
Rayleigh-Benard convection, where from multiple perspec¬ 
tives it appears that the trajectory is close to a periodic 
orbit and that the rate of change in the patterns of the 
temperature field is far from constant. This can be clearly 
seen visually (see video 6, 7, or 8 in the supplementary 
materials). Moreover, both the speed plot. Figure |^b), 
and the distance matrix. Figure HKb): suggest recurrent 
dynamics. However, we note that the rate of change, es¬ 
pecially using the bottleneck distance, is typically small 
except for short periods of time at which the speed spikes. 
The distance matrix has a distinct checkerboard pattern, 
with the edges corresponding to the spikes, again indicat¬ 
ing a rapid and large change in location in the space of 
persistence diagrams. 

The maximum bottleneck distance between the consec¬ 
utive sampling points is 83.5 (Figure [^b) caption), while 
the diameter of the point cloud is only ds = 99.5 (Fig¬ 
ure Therefore, we expect that significant portions 

of the trajectory are missing. Indeed, Figure [l7|(a) shows 
that there are several persistence points in PDo(Ar, db) with 
a (finite) death coordinate larger than ten. Thus, at a 
length scale of 20 (which is forty times larger than the noise 
threshold), the sample of the trajectory is broken into sev¬ 
eral pieces. The largest gap between different pieces of the 
trajectory is 40, as indicated by the persistence point with 
coordinates (0,20). This means that the sampling rate is 
far from adequate. 

The diagram PDi(Ai, ds) in Figure[T7|(b) contains a sin¬ 
gle dominant point at (20, 32.5) with life span 12.5. How¬ 
ever, unlike in our analysis of the Kolmogorov flow in the 
previous section, we cannot argue that this point corre¬ 
sponds to a single dominant loop along which the data is 
organized because of the gaps in the sampling of the orbit. 
As mentioned in Section [773} the missing parts of the or¬ 
bit could introduce loops of similar size corresponding to 
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Figure 17: Persistence diagrams for 500 points of Rayleigh- 

Benard convection, generated from the distance matrix shown in 
Figure [T^b). (a) The diagram PDo(X, ds) shows the appearance 
of persistence points with death values significantly greater than the 
noise threshold, indicating that the sampled trajectory is broken into 
pieces and sampling is not fast enough to resolve the periodic orbit, 
(b) The diagram PDi{X, ds) shows the presence of a loop that is 
born when the pieces of the orbit merge together. 


Figure 18: Persistence diagrams for 70,000 points of Rayleigh-Benard 
convection subsampled with <5 = 4.5, resulting in a point cloud 
Y' C Per with 523 points, (a) Persistence diagram PDo(T^dfl) 
indicating a single dominant component above the noise threshold, 
(b) Persistence diagram PDi(Y', ds) with subsampling error bounds 
shaded in gray. 


secondary structures. These structures might occur due 
to the fact that the loop corresponding to the underly¬ 
ing almost-periodic dynamics might be twisted, pinched, 
or bent in Per. In order to obtain information about sec¬ 
ondary structures, we require a faster sampling rate. 

We increased the sampling rate considerably and col¬ 
lected approximately 7 x 10® equally-spaced snapshots 
of the temperature field over four-and-a-half periods and 
compute the associated persistence diagrams, producing a 
point cloud Y C Per. The maximal distances between the 
consecutive frames for the increased sampling rate drop to 
ds = 4, dvpi = 28, and = 6.52. The new value of 
ds is much closer to our estimate of the numerical error 
and it is more than 24 times smaller than the diameter of 
the point cloud generated from the slower sampling. Since 
the point cloud could only increase in diameter through 
increasing the sample rate, we consider this sampling rate 
to be satisfactory. 

Our next step is to use the ideas introduced in Sec¬ 
tion |7.3| to reduce the size of the sample and to complete 
our analysis. First we construct a d-dense, (5-sparse sub¬ 
sample Y' of the point cloud Y . The smallest value of 
6 for which we were able to compute the persistence dia¬ 
grams PD(Y', ds), using 256 GB of memory, is <5 = 4.5. 
This value is only slightly larger than the largest distance 
between the consecutive states and, since the diameter of 
the subsampled point cloud is 99.5, the relationship be¬ 
tween the length scale of the smallest detectable feature 
and the length scale of the diameter of the point cloud 
is still sufficient to resolve the geometry of the dynamics. 
The resulting persistence diagrams PP){Y' ^dg) are shown 
in Figure 

As shown by PDo(F', ds), Figure [T^ a), the point cloud 
merges to a single connected component at 0 = 4.5. This 


indicates that the sample of the trajectory is not broken 
into different pieces separated from each other. Since the 
maximum consecutive distance between any two points in 
Y is 4, the loop along which the data is organized should 
be present for 6 — 2. However, after subsampling, it is 
possible that the loop will not be born until 9 = 2 + S. 
Looking at the diagram PDiiY',dB) in Figure [l^b), we 
see that it contains a dominant point at (4.5,27.75), and 
so the loop was indeed born before 9 = 2 + 5. This is 
the loop along which the point cloud is organized. Now, 
there is another point, (12.5,26) S PDi(Y', 63 ), with life 
span 13.5. This point corresponds to a secondary struc¬ 
ture of the orbit. Indeed, it can be seen from the distance 
matrix for the d-sparse, d-dense subsample (not shown for 
brevity) that the part of the orbit corresponding to the 
fast dynamics (missing for the slow sampling rate) revis¬ 
its very similar states before continuing along the main 
loop. However, the development of more sensitive tools is 
required to fully understand these secondary features. 


We now turn our attention to the differences be¬ 
tween the persistence diagrams of the original point cloud 
Y and its subsample Y'. Theorem 7.3 implies that 
c?b(PD(F, ds), PD(y', ds)) < d, and so there exists a bi- 
jection between the points in PD(y, ds) and PD{Y'^ds) 
such that the distance between matched points is less than 
4.5. According to Remark |7.4[ for the dominant point 
(4.5,27.75) G PDi(y',dB), there is exactly one corre¬ 
sponding point in PDi(y,dB). This point is the unique 
point in PDi(y,dB) that lies inside of the shaded box 
touching the point (4.5,27.75), see Figure [T^b). The 
same is true for the other dominant point. Moreover, 
there are no points in PDi(y, ds) outside of the shaded 
regions. Points in PDi{Y,dB) that do not correspond to 
the off-diagonal points in PDi(F',d_B) can appear only 
51^/2 « 3.18 far away from the diagonal. 
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11. Conclusion 

We have shown how persistent homology can be used to 
identify equilibria and study periodic dynamics, and how 
this method is particularly natural when solutions must be 
identified that lie on a group orbit. We study two regimes 
in Kolmogorov flow: chaotic dynamics due to the appear¬ 
ance of unstable fixed points, and a periodic flow that ex¬ 
hibits drift in a direction of continuous symmetry. We 
also study an almost-periodic orbit from Rayleigh-Benard 
convection. We solve for the unstable equilibria in the 
first case and sample the periodic orbits in the other two 
cases, and use persistent homology to project these solu¬ 
tions to the space of persistence diagrams. We provide 
theoretical results that show this projection is stable with 
respect to numerical errors and discuss how the projec¬ 
tion naturally identifies symmetry-related solutions. We 
give three different metrics on the space of persistence di¬ 
agrams that can be used to study pattern evolution on 
large versus small spatial scales, and how these metrics can 
be used to estimate numerical error in the space of per¬ 
sistence diagrams. We develop an intuition for studying 
dynamics in the space of persistence diagrams by looking 
at point clouds in two-dimensional Euclidean space, and 
discuss methods for determining if a continuous trajectory 
has been sampled densely enough to resolve the underly¬ 
ing dynamics, as well as mathematical methods used to 
address issues associated with computing on large sample 
sets. We demonstrate the efficacy of these methods on 
Kolmogorov flow and Rayleigh-Benard convection, com¬ 
paring our methods to traditional Fourier methods where 
appropriate. Our results show that the geometry of the 
dynamics are recovered in each case. For Rayleigh-Benard 
convection in particular, we show that the dynamics are 
recovered even after truncating the simulated data to an 
8 -bit temperature field, and so this approach is suitable 
for studying data collected experimentally, rather than nu¬ 
merically. Also for this flow, we recover more subtle as¬ 
pects of the geometry in the space of persistence diagrams. 
In summary, we have shown that this method is both ro¬ 
bust to noise and sensitive to more complicated dynam¬ 
ics, and that it is appropriate for studying dynamics on 
datasets obtained experimentally. Our ongoing research 
will further refine these tools. 
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Appendix A. Homology of Sets on a Torus 

Topologically, a torus T = x 5'^ is a closed surface de¬ 
fined as the product of two circles. It can be also described 


as a quotient of the Cartesian plane under the identifica¬ 
tions {x,y) ^ {x + l,y) ~ {x,y + l). The homology groups 
of T are given by 


Hn{T) 


Z if n = 0,2, 

I? if n = 1 , and 
0 otherwise. 


Intuitively, this means that T has a single connected com¬ 
ponent (n = 0), two independent loops (n = 1), and a 
single cavity (n = 2). For a more detailed treatment of 
the following material, see See Hatcher, Ch 0 for a ref¬ 
erence to homotopies of maps, and Hatcher Ch 1 for a 
reference on identifying independent loops in a space, or 
the notion of the fundamental group. 

In this section, we explain the notion of independent 
loops of subsets of a torus. By a loop, we mean a continu¬ 
ous path 7 : [0,1] —>■ T such that 7 ( 0 ) = 7 ( 1 ). We will also 
be using the notion of a homotopy of loops, which can be 
thought of as deforming one loop continuously to another. 
More precisely, a homotopy of loops from a loop 70 to a 
loop 7 i is a continuous function F : [0,1] x [0,1] —)■ T 
such that F(s, 0 ) = 7 o(s) and ^( 5 , 1 ) = 7 i(s) for all 
s e [0,1], and F(0,t) = F(l,t) for all t G [0,1]. De¬ 
fine —"f(t) := 7(1 — t), which runs the loop 7 backwards 
in time, and define n"f{t) := 7 ([ntJ), which traverses the 
loop 7 n times. Finally, given two loops ai and 02 , we can 
form their sum 01 - 1-02 by taking a path d : [ 0 , 1 ] —?► T such 
that d( 0 ) = oi( 0 ) and (5(1) = 02 ( 0 ) and form the loop 


(oi -I- a2)(t) = 


01 (4t) [0,1/4] 

S(4t - 1) :tG [1/4,1/2] 
02(4t-2) :tG[l/2,3/4] 

-S(4t-3) :tG [3/4,1]. 


Algebraically, this can be written as oi -I- (5 -I- 02 — d = oi -|- 
02 . We say that a loop 7 is independent of a collection of 
loops oi,..., o/c if there does not exist a homotopy of loops 
from 7 to a. linear combination of the loops oi,..., o^. 


Figure 19 shows eight subsets {W}.^p of a torus. Note 


that Xi C Xj for i < j, and the sets can be considered 
as sub-level sets of some scalar function /. We will now 
examine each set and identify the independent loops in 
each. 

The set Xq is contractable. Hence, every loop inside Xq 
can be deformed to a point inside of Xq. This means that 
there is no independent (nontrivial) loop present in this 
set. 

The set Xi cannot be contracted to a point. It forms 
a band that wraps around the torus. There are many 
different loops (wrapping once around the torus from left 
to right in the picture) inside of this band. However, we 
can choose a single loop ai that represents all of them; 
every other loop can be either continuously deformed to a 
linear combination of the loop oi, or contracted to a point. 
Similarly, the set X 2 contains two independent loops. 

The set A 3 is formed by linking the horizontal bands 
present in A 2 . The loops 0:1 and 0:2 are still independent 
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in X3 (one cannot be deformed to the other inside X3). It 
might seem that there is a new independent loop, 7. How¬ 
ever, this is not case because 7 can be deformed (inside of 
X3) to the union of the black lines corresponding to ai, 02 
and < 5 . After this deformation, the loop traverses S twice: 
the right part of the deformed loop traverses S from the 
top to the bottom, and the left part in the opposite direc¬ 
tion. Algebraically, the deformed loop can be expressed 
as oi — (5 — 02 + <5 = «! — 02- This shows that 7 can be 
deformed to a linear combination of the loops oi and 02- 
Thus, 7 is not a new independent loop. 

The set X4, obtained from X3 by adding another link 
between the horizontal bands, contains a new indepen¬ 
dent loop, /?!, consisting of the edges 5i,(52,^3 and S 4 ( 
/ 3 i = i 5 i -I- 1^2 -I- (Js -I- ^4). This means that the loop /?i can¬ 
not be deformed inside of X4 to a linear combination of 
the loops Qfi and 0:2 ■ Again, the loop 7 is not indepen¬ 
dent from oi, a 2 , and / 3 i because it can be perturbed to 
ai-\-5^—a2+54—a2+5i+52 = ai—a2+/3i) which is a linear 
combination of the loops 0:1,02, and / 3 i. Therefore, there 
are three independent loops in this case. Adding another 
link between the horizontal bands creates another inde¬ 
pendent loop. Hence, the number of independent loops 
for two bands with n links is n -I- 1 . 

Alternatively, we can view the set X^ as a single band 
with one puncture, and X4 as a single band with two punc¬ 
tures. The number of independent loops is n -I- 1 , where n 
is the number of punctures, and the extra loop is generated 
by the band. 

Due to the identification (x,y) ~ {x,y + 1 ), the set X5 
contains another link between the horizontal bands. This 
band creates another puncture. In Figure [I^f), this punc¬ 
ture seems to have four distinct components (white blocks 
in the corners). However, under the boundary identifica¬ 
tion, they correspond to a single component. Therefore, 
there are four independent loops. 

The independent loops start disappearing as the punc¬ 
tures are hlled in. The set Xq contains a single puncture, 
and according to the previous argument, there are two in¬ 
dependent loops, a and j 3 . In this case, the loop 7 can 
be deformed to a point inside of the set Xq. Finally, the 
set Xr = T = X contains two independent loops 
corresponding to the two copies of that generate the 
torus. 
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Figure 19: Different subsets of the torus, (a) The set Xq does not contain any independent loops because any loop in Xq can be deformed to 
a point inside of Xq. (b) The set Xi contains a single independent loop oi. All the other loops can be either deformed to oi or to a point 
inside of Xi. (c) The set X 2 has two independent loops qi and 02 - (d) The loop 7 is not independent in X^ because it can be deformed to 
the linear combination of the loops ol\ and 0 : 2 . (e) Adding an extra link creates a new independent loop /3i in A ’ 4 . Again 7 is not independent 
because it can be deformed to a linear combination of the other loops, (f) The set X^ is produced by adding another link which produces 
one new loop, (g) The loop 7 in Xq is not independent because it can be deformed to a point, (f) X 7 = T contains two independent loops 
corresponding to two copies of generating the torus. 
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